Tata Consultancy Services Linguistic Data Consortium for Indian Languages (LDC-IL)

Authors

  • Michael Zock
  • Reinhard Rapp
Abstract

Lexical networks can be used to good effect for semantic analysis of texts, word sense disambiguation (WSD) and, more generally, graph-based Natural Language Processing. Usually, strong relations between terms (e.g., cat --> animal) are sufficient for the task, but quite often weak relations (e.g., cat --> ball of wool) are necessary. Our purpose here is to acquire such relations by means of online serious games, as other classical approaches seem impractical. Indeed, it is difficult to ask users (non-experts) to define a proper weighting for the relations they propose, so we decided to tie weights to the frequency of their propositions. This allows us to acquire the strongest relations first, but also to populate the long tail of an already existing network. Furthermore, when we had the users themselves evaluate our network by means of tip-of-the-tongue (TOT) software, we realized that they tend to favor relations of the long tail and thus promote their emergence. Developing the long tail of a lexical network with standard and non-standard relations of low weight can be advantageous for tasks such as word retrieval from clues or WSD in texts.
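
The idea of tying a relation's weight to how often players propose it can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample proposals and the threshold value are all assumptions made for illustration, and the weight is simply taken to be the raw proposal count.

```python
from collections import Counter

def weight_relations(proposals):
    """Weight each (source, target) relation by how often it was proposed.

    Assumption: weight = raw proposal count; the abstract only says that
    weights are related to proposal frequency.
    """
    return Counter(proposals)

def split_head_and_tail(weights, threshold):
    """Separate strong relations from the long tail using a hypothetical cut-off."""
    head = {rel: w for rel, w in weights.items() if w >= threshold}
    tail = {rel: w for rel, w in weights.items() if w < threshold}
    return head, tail

# Hypothetical player proposals for the term "cat".
proposals = [
    ("cat", "animal"),
    ("cat", "animal"),
    ("cat", "animal"),
    ("cat", "ball of wool"),
]

weights = weight_relations(proposals)
head, tail = split_head_and_tail(weights, threshold=2)
```

With these sample proposals, the frequently proposed relation cat --> animal lands in the head, while cat --> ball of wool, proposed only once, stays in the long tail.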

Similar resources

More Data and Tools for More Languages and Research Areas: A Progress Report on LDC Activities

This presentation reports on recent progress the Linguistic Data Consortium has made in addressing the needs of multiple research communities by collecting, annotating and distributing, simplifying access and developing standards and tools. Specifically, it describes new trends in publication, a sample of recent projects and significant improvements to LDC Online that improve access to LDC data...

LDC Language Resource Papers: Building a Bibliographic Database

The Linguistic Data Consortium (LDC) creates and provides language resources (LRs) including data, tools and specifications. In order to assess the impact of these LRs and to support both LR users and authors, LDC is collecting metadata about and URLs for research papers that introduce, describe, critique, extend or rely upon LDC LRs. Current collection efforts focus on papers published in jour...

Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report

Changes in the supply of and demand for language resources continue to affect the role of large data centers such as the Linguistic Data Consortium (LDC) and European Language Resource Center (ELRA) within the research communities they serve. The past few years have seen increased demand for: intensively multi-modal resources, larger data sets in high-density languages and new data in low dens...

MACROPHONE: An American English Telephone Speech Corpus

Macrophone is a corpus of approximately 200,000 utterances, recorded over the telephone from a broad sample of about 5,000 American speakers. Sponsored by the Linguistic Data Consortium (LDC), it is the first of a series of similar data sets that will be collected for major languages of the world in a cooperative project called Polyphone. It is designed to provide telephone speech suitable for t...

The Creation, Distribution and Use of Linguistic Data: the Case of the Linguistic Data Consortium

The Linguistic Data Consortium (LDC) is an open consortium of universities, companies and government research laboratories. It creates and distributes speech and text databases, lexicons and other resources. The University of Pennsylvania is the LDC’s host institution. The LDC was founded in 1992 with a grant from the Defense Advanced Research Projects Agency (DARPA). Currently, all LDC publica...

Publication date: 2012